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Abstract. The direct detection of Gravitational Waves (GWs) by Pulsar 
Timing Arrays (PTAs) is very likely within the next decade. While the stochastic 
GW background is a promising candidate for detection it is also possible that 
single resolvable sources may be detectable as well. In this work we will focus on 
the detection and characterization of single GW sources from supermassive black 
hole binaries (SMBHBs). We introduce a fully Bayesian data analysis pipeline 
that is meant to carry out a search, characterization, and evaluation phase. This 
will allow us to rapidly locate the global maxima in parameter space, map out 
the posterior, and finally weigh the evidence of a GW detection through a Bayes 
Factor. Here we will make use of an adaptive metropolis (AM) algorithm and 
parallel tempering. We test this algorithm on realistic simulated data that are 
representative of modern PTAs. 



1. Introduction 

In the next few years pulsar timing arrays (PTAs) are expected to detect gravitational 
waves (GWs) in the frequency range 10~^ Hz-10~^ Hz. Potential sources of GWs in 
this frequency range include supermassive black hole binary systems (SMBHBs) [T], 
cosmic (super)strings [2], inflation |5, and a first order phase transition at the QCD 
scale |3]. The community has thus far mostly focused on stochastic backgrounds 
produced by these sources, however; sufficiently nearby single SMBHBs may produce 
detectable continuous waves with periods on the order of years and masses in the 
range IO^Mq-IO^M© [51ini[7j. The concept of a PTA, an array of accurately timed 
millisecond pulsars, was first conceived of over two decades ago O [9]. Twenty 
years later three main PTAs are in full operation: the North American Nanohertz 
Observatory for Gravitational waves (NANOGrav; llQj), the Parkes Pulsar Timing 
Array (PPTA; [TT]), and the European Pulsar Timing Array (EPTA; [H]). The three 
PTAs collaborate to form the International Pulsar Timing Array (IPTA; [T3]). 

A significant amount of work has gone into the detection problem for continuous 
GWs from SMBHBs. Both [Mj and [15] use a Lomb-Scargle periodogram based 
approach to essentially measure the excess power that a continuous GW would induce 
compared to a noise only model. [16j developed a Bayesian framework aimed at 
the detection of GW memory in PTAs; however, the authors mention that the 
methods presented could be used for continuous GW sources as well. Most recently, 
a maximized likelihood based approach has been developed by [TT] [18] and was later 
extended to include multiple resolvable sources in [TO] . 
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Many authors have focused on determining the parameter accuracy that we may 
hope to extract from a future detection of a continuous GW from a SMBMB. [7j 
use an Earth-term only signal model to perform a study of SMBHB parameters that 
are measurable with PTAs using a Fisher matrix approach. [20j have developed a 
Bayesian Markov Chain Monte-Carlo (MCMC) data analysis algorithm for parameter 
estimation of a SMBHB system in which the pulsar term is taken into account in the 
detection scheme, thereby increasing the signal-to-noise-ratio (SNR) and improving 
the accuracy of the GW source location on the sky. Recently, [H] have developed 
parameter estimation techniques based on vector Ziv-Zakai bounds incorporating 
the pulsar term and have placed limits on the minimum detectable amplitude of 
a continuous GW source. In the aforementioned work, the authors also propose a 
method of combining timing parallax measurements with single-source GW detections 
to improve pulsar distance measurements. 

In this work we introduce a fully functional Bayesian pipeline aimed at both 
detection and parameter estimation of single continuous GWs. To this end, we make 
use of MCMC augmented with Parallel Tempering, an adaptive jump proposal scheme 
and thermodynamic integration for evidence evaluation. Previous work has made use 
the Fisher matrix or similar techniques to either estimate parameter uncertainties or 
propose jumps in an MCMC algorithm. Since it is known that the Fisher matrix 
is limited in use and only applies to large SNR (35], we choose to use an Adaptive 
Metropolis (AM) approach first developed in [231 [21] and later applied to cosmology 
and GW parameter estimation in [25l [Ml |2I| . 

The layout of the paper is as follows. In section [2] we introduce the signal model 
and notation used in this work. In section [3] we briefly review MCMC techniques, 
adaptive metropolis, parallel tempering, thermodynamic integration and introduce our 
likelihood function and priors. In section [4] we introduce the semi-realistic simulated 
datasets that we use to test our algorithm. In section [5] we test our algorithm 
on simulated data and make a few statements about the measurability of SMBMB 
parameters in realistic PTA data sets. Finally, we briefly mention future work and 
conclude in section [6l 

2. The signal model 

In general, pulsar timing residuals are defined as the difference of observed times-of- 
arrival (TOAs) of radio pulses and a deterministic timing model. In this section we 
will review the form of the residuals induced by a non-spinning SMBHB in a circular 
orbit and introduce our notation. The GW is a metric perturbation to flat space time 
defined in terms of its two polarizations as 

habit, tl) = e+ Cl) + e„\(f))/ix {t, Cl), (1) 

where ft is the unit vector pointing from the GW source to the Solar System Barycenter 
(SSB), , hx and e^i^ {A = +, x) are the polarization amplitudes and polarization 
tensors, respectively. The polarization tensors can be converted to the SSB by the 
following transformation. Following ^28) we write 

^Ibi^) = T^a-rhb - flaflb, (2) 
^abi^) = ^aflb + flafnb, (3) 
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Cl — —{smO cos,tf)x — {smO sanp)y — (cos0)z, (4) 

m = — (sin + (cos iy9)y, (5) 

n = — (cos cos — (cos6'siiiiy9)y + {s\n9)z. (6) 

In this coordinate system, 6 — t: /2 5 and ip — a are the polar and azimuthal angles 
of the source, respectively, where 5 and a are declination and right ascension in usual 
equatorial coordinates, where the North Celestial Pole is in the z direction and the 
Vernal Equinox is in the x direction. 

We will write our GW induced pulsar timing residuals in the following form: 

s{t, Cl) = F+{h)As+{t) + F"" (17)Asx (t), (7) 

where 

As^(i) =SA(ip)-SA(ie), (8) 

and te and tp are the times at which the GW passes the Eartl||]and pulsar, respectively, 
and the index A S {+, x} labels polarizations. The functions F^{d) are known as 
antenna pattern functions and are defined by 

F+in)^^^^^^^^'l^^^ (9) 
F^^n)= ^^-P^[^-P\ (10) 

where p is the unit vector pointing from the Earth to the pulsar. Also, from geometry 
we can write[§] 

tp^te- L{l + (l-p). (11) 

Given these definitions, we can write the GW contributions to the timing residuals as 
[Ml [20] 

s+ {t) = 3 - sin[2<i>(t)] (1 + cos2 cos 2^- 

— 2cos[2$(i)] cost sin 2V' 
sAt) = -F, TwrR - sin[2$(<)] (1 + cos^ l) sin 2iP 

+ 2cos[2$(i)] cost cos 2^^ 

where V' is the GW polarization angle and t is the inclination angle of the SMBHB. 
The orbital phase and frequency of the SMBHB are 

+ (14) 

J Technically, this is the time that the GW passes the SSB, however, following convention we will 
label this as the Earth time and will later refer to the Earth-teim, keeping in mind that, in practice, 
all variables are referenced to the SSB. 
§ Here we use geometrized units where G = c = 1. 
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and 

L.{t) = 0.0 (l - ^M'^'uj'Jh^ . (15) 

where $o a-nd ujq are the initial values at the time ol our first observation, the chirp 
mass is defined by = {mim2)^^^ /{mi + 777.2)"'^/^, where mi and m2 are the masses 
of the two SMBHs, and Dj^ is the luminosity distance to the source. We can relate the 
GW frequency to the orbital frequency of the binary by = 2u!o for circular orbits. 
Note that we use the observed redshifted values. For example, the chirp mass and 
orbital angular frequency in the rest frame are A^,. = A^/(l + z) and ujr = <^o(l + z), 
respectively, where z is the cosmological redshift. 

Eqs. [14] and [15] are true in general and can be applied when the frequency evolves 
appreciably over the total observing time. However, it is very useful to work under 
the assumption of slowly evolving binaries where Tchirp S> T, with T the observing 
time and 

T.,,, = = 3.2 X 10^ yr r ^) ( r^,^] , (16) 



uj ■ ' \10»MqJ VI X 10"** Hz 

where 



^4m'^'4'/'. (17) 



Since typical PTA observations are on the order of 10-20 years and T/Tchh-p ^ 10~^, 
this is a safe assumption for a broad range of masses and initial orbital frequencies of 
interest. With this approximation we can write the orbital frequency and phase for 
the earth term simply as 

$e(t) = *o+c^oi (18) 

LOeit)^0Ja. (19) 

However, for the pulsar term we are dealing with the retarded time of Eq. [TT] and 
must include the first order corrections to the orbital frequency and phase 

*p(^) = *o + i^at - LUoL{l + n-p) ~ LuL{l + Cl-p)t (20) 
ojp{t) ^ ujo - ^LC^ + ^ ■ p), (21) 

where L is on the order of a kpc and the last term in the pulsar phase containing uj 
terms is responsible for any frequency evolution over the earth-pulsar light crossing 
time. As we will se later, writing the pulsar phase in this way will become very useful. 



3. Implementation 

While Bayesian parameter estimation and model selection has been commonplace in 
LIGO and LISA 29^ 25. M. many PTA applications have been more 

frequentist in nature [IH |35j [36l |37| [151 IHl Ull US] and only recently has the Bayesian 
framework been put to use in the PTA context [MlIISlEnilMlllQlllllHSlEl]. Here 
we will briefiy review Bayesian inference for clarity of notation. In the Bayesian 
framework, the data d is assumed to be fixed and the parameters 6 that parameterize 
a hypothesis (or model) H are assumed to be randomly distributed. In this case, 



A Bayesian analysis pipeline for continuous GW sources in the PTA band 



5 



the data is used to update our prior knowledge of the hypothesis p{9, H) via Bayes 
theorem 

where p{9\d,'H) is the posterior probability distribution, that is, the probability that 
the set of parameters 9 for hypothesis H could generate the given data d. In the above 
expression p{d\9,'H) is the likelihood function, the probability that this dataset d is 
drawn from a random distribution described by hypothesis H and parameterized by 
9. Lastly, the prior p(9, %) encompasses any prior knowledge we have about the given 
hypothesis and p(dl'H) is the marginalized likelihood or evidence 

p{d\n) = j d9p{d\e,n)p{e,n). (23) 

For the purposes of parameter estimation we can safely ignore the evidence in Bayes 
theorem since it is just a normalizing factor that does not depend on the model 
parameters 9. However, if we want to perform model selection to claim a detection or 
compare different waveforms then the evidence is crucial. In this case we can make 
use of the Bayesian odds ratio between models and "S" 

^ p{d\nA) pjUA) . . 

p{d\'HB)p{'HBy ^ ' 

where the first factor is known as the Bayes Factor which is our confidence in one 
model over the other based on the data (henceforth we will denote the Bayes factor 
as S) and the second term is the prior odds ratio for models A and B which describes 
our prior belief in both models. In this paper we will only deal with the Bayes factor 
and assume that the prior odds ratio is unity. 

In the following sections we will briefly review Markov Chain Monte Carlo 
(MCMC) and describe some additional techniques used in our MCMC to help speed 
convergence and improve mixing. Finally, we will introduce our likelihood function 
and priors used in this work. 

3.1. Markov Chain Monte Carlo 

In this section we will quickly review the concept of MCMC. The appeal of MCMCs in 

general is that they sample directly from the posterior distribution and can efficiently 
explore the parameter space. The algorithm begins by specifying a point in some 
multidimensional parameter space x. This point can be chosen at random from the 
prior or can be initialized in some other way if we have additional information about 
the posterior structure. From here, we propose a "jump" to a new point in parameter 
space, y via a jump proposal distribution function q{y\x). We then evaluate the 
posterior at this new point and accept the jump with probability a = mm{l,H) 
where H is the Hastings ratio 

_ p{y\d)q{x\y) 
""^^^y- p{x\d)q{y\x)' ^ ^ 



The Hastings ratio is constructed to ensure the reversibility in subsequent steps in the 
chain, a concept known as detailed balance. We repeat this process for many iterations 
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until a convergence criteria is reached (i.e. autocorrelation time or Gelman Rubin R 
statistic [43j ) and the marginalized posterior pdfs of the parameters are simply the 
histograms of the parameter values in the chain. The choice of proposal distribution 
will be very important to achieve rapid convergence and we will explore this problem 
in the next section. 



3.1.1. Adaptive Metrpolis Here we will outline an adaptive metropolis (AM) 
algorithm [23] (hereafter HSTOl). As mentioned in the previous section the choice 
of jump proposal is very important for rapid convergence and adequate mixture of the 
chains. For this work we choose to make use of an adaptive scheme where the gaussian 
proposal distribution is updated using the past history of the chain. By using the full 
past history of the chain this algorithm is indeed non Markovian but it is shown 
in HSTOl that it retains the correct crgodic properties and thus will give unbiased 
samples from the posterior probability distribution. The algorithm is actually quite 
simple. First we use a multidimensional proposal distribution with diagonal covariance 
matrix Co = diag(eAstart), where for this work we choose e = 10~^ and Astait is drawn 
from our prior distribution. By using this covariance matrix we assure that the initial 
jumps will be small (likely to be accepted) and thus we will begin to build up points 
for later adaptation. After some number of iterations, ?7, (for this work we choose 
r] = 1000) the covariance matrix at iteration n becomes 

a = . . "-^ (26) 

I Sc;Cov(Ao, . . . , A„_i) n> rj and mod(n, r/) = 0, 

where Sd is a parameter that depends on the dimension of the problem and 
Cov(Ao, . . . , A„_i) is the sample covariance matrix at the nth iteration of the 
algorithm. HSTOl suggest a value of Sd = 2.4^/ndim, where ridim is the dimension 
of the problem, however we have found that we need to use a smaller value to obtain 
optimal acceptance ratios around 25% 44J. As shown in the above equation, we 
do not perform the adaptation at every iteration of the chain but instead update the 
covariance matrix every 77 iterations, which helps shorten the runtime of the algorithm. 
This adaptive method will help speed convergence as the jump proposal will begin to 
mimic the posterior and take into account any parameter correlations, however, in 
general, it will not help locate the global maximum of the posterior surface rapidly. 



3.1.2. Parallel tempering and thermodynamic integration A major problem with 
generic MCMC samplers is the tendency to get trapped in a local maxima. For a 
standard search it is unlikely that we will know a priori where the global maxima 
are located in parameter space, thus we must start our chain from a random point in 
the prior space. We want our algorithm to then quickly locate the global maxima in 
the parameter space. To accomplish this in a way that satisfies detailed balance we 
make use of parallel tempering. This technique involves different chains exploring the 
parameter space simultaneously, each with a different target distribution 

pmp)^p{6)p{d\ef, (27) 

where /3 < 1 is the inverse "temperature". This will essentially flatten out the 
likelihood surface allowing the chains to more freely explore the entire prior volume. 
The "hot" chains will inform the "colder" chains and vice versa by proposing parameter 
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swaps between different temperatures. A parameter swap between the zth and jth 
temperature is accepted with probabihty a — min(l, iJ), where the muhi-temperature 
Hastings ratio is 

^ ^ p{d\0,,PMd\OJ,fi^) (28) 
p{d\e,,pMd\0,.P,)' 

By swapping parameter states between different temperatures this ensures rapid 
location of the global maxima. The true posterior samples will come from the 
(3 = 1 chain but the higher temperature chains can be used to evaluate the evidence 
via thermodynamic integration (see e.g. |31j and references therein). Consider the 
evidence for a chain with temperature as part of a partition function 



Z(/3) = / d0p{d\9H,l3)p{9\H) 

(29) 

d0p{d\d,Hfp{0\H). 



Since the prior is independent of /3, we can take the log and integrate over /3 to obtain 



In p{d\n)= d(3{\np{d\e,n)}p, (30) 







where {in p{d\9,H)) is the expectation value of the likelihood for the chain with 
temperature 1//3. The expectation values are calculated over the post burn-in chains. 
In practice, it is important to choose a temperature ladder such that we explore the 
entire likelihood surface and recover the full integrand of Eq. |30| For example, if we 
expect (or have injected) a signal with a given SNR, then the highest temperature 
chain will decrease the SNR by a factor ^ l/^/T, therefore; if we expect SNR = 10, 
then a maximum temperature of Tmax = 100 should be sufficient. However, we may 
need to use much higher temperatures to ensure the that above integral has converged 
[31] . In this work our temperature ladder is constructed to be exponentially spaced 
with maximum temperature Tmax = SNR^. 



3.1.3. Jump Proposals As mentioned in section 3.1.1 we use an AM scheme to 
update the covariance matrix for multidimensional gaussian jumps. However since 
our parameter space is quite large (8 + iVpsr) we do not always update all parameters 
simultaneously. In ~70% of jumps we will jump in subsets of correlated parameters 
such as the sky location parameters and pulsar distance as well as the chirp mass 
and distance. In ^-^20% of jumps we update all parameter simultaneously and in the 
remaining ~10% of jumps we choose one parameter at random and propose large 
jumps in parameter space. 

In order to ensure proper mixing and exploration of our chains we have chosen 
to expand the parameter space in the following way. If wc introduce the initial pulsar 
phase 

4>p = ujqL{1 + h ■ p) (31) 
and then solve for the pulsar distance 

'^^'^ + (32) 



uJi}{l + VL-p) cJo(l + f^-p) 
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where n is the number of times the phase has wrapped aromid 27r (typicaUy 1000s). 
By writing the distance to the pulsar in this fashion we can separate out the very 
smah scale fluctuations (L*'™'*") that are important for coherence and are typically 
less than a pc, and the large scale fluctuations (L^^^) that are on the order of a 
kpc are important for determining the frequency evolution of the binary. These two 
components are essentially independent and explain physics on vastly different scales. 
So now re- writing Eq. [20]we have 

<^p{t) = ^0 + ^^ot - (pp - ujL{1 + Cl ■ p)t, (33) 

where we jump in both (j)p and L sa L^^^. 

3.2. Likelihood and Priors 

Following [ini m] we write the pulsar timing residuals in the linear approximation as 

5t = MS^ + n + s, (34) 

where St are the timing residuals, AI is the design matrix, 6$, is the parameter offset 
between the true pulsar timing parameters and our best fit parameters, n is the noise 
present in the TOAs (radiometer noise, red noise, etc.), and s is our continuous GW 
signal. Since n is assumed to be gaussian we can write the likelihood function for a 
single pulsar as 

exp \-^{St-s- MS^f (St -s- M6^) 

« ^ — vmynsd -■ <'-^' 



where 9 are parameters that describe the noise in the pulsar residuals and A the 
parameters that characterize the continuous GW signal. It was shown in |3D] that 
this likelihood function can be marginalized over the timing model parameters 5^ to 
obtain 



exp 



iiSt- sY G{G^CG)'^G^ {St - s) 



p(St\e,\) = 1 , (36) 

V(2^)"-™det(G^CG) 

where G is an n x (n — m) matrix with n the number of TOAs and m the number of 
fitted parameters in the timing model. The derivation of G can be found in |40| and 
will not be explored here. We can think of the matrix G"^ as a projection operator 
that projects our data St onto the null space of M, that is, it projects the data into 
a subspace orthogonal to the timing model fit. In this way we have fully taken into 
account the timing model fitting procedure. 

For this work we will assume that the noise parameters 6 are know from some noise 
estimation done beforehand (see e.g. |40L I45p and will only focus on characterizing 
the continuous GW parameters A. We will also assume that the residuals between 
pulsars are uncorrelated. In other words, we are assuming that the stochastic GW 
background will be negligible compared to the intrinsic noise in each pulsar. In general 
this is not likely to be a good assumption when we would expect a detection of a 
single GW source. The effects of omitting the correlations in the likelihood function 
are unknown and will be the subject of future work. Under these assumptions, the 
likelihood function for the full PTA can be written as 

p{5t\X)=l[p{St^\Xo,), (37) 
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where 5ta and Aq and the residuals and model parameters for the ath pulsar, 
respectively. Since we are assuming the noise is fixed (and known) then we can write 
the log-likelihood ratio of a model with a single continuous GW to a model with just 
noise as 

1 



lnA=^ {5to,\s{K\ 



s{Xa)\s{\a) 



where the inner product between two time-series x and y is 

{x\y) = x'^G{G'^CG)-^G'^y. 



(38) 



(39) 



We choose flat priors on all angular parameters and flat priors in the log of the 
chirp mass, luminosity distance, and frequency of the GW. For the pulsar distance 
prior we use the current electromagnetic (EM) measurements either from timing 
parallax or Very Long Baseline Interferometry (VLBI) to contain the prior space as 
follows 

1 / (L„-Lr^'^ 



p{L) = n 



1 \/27 

Q— 1 V 



: exp 



2al 



(40) 



where is the best measured distance for the ath pulsar and ctq, is the 1-sigma 
uncertainty on that distance measurement. 



4. Simulated data sets 



In this work we will simulate "toy model" datasets that represent realistic yet 
optimistic present day residuals. We have chosen an array of 10 pulsars that are 
meant to represent the best 10 IPTA pulsars in terms of timing precision. The datasets 
have uneven sampling, varying error bars, and time spans corresponding to the real 
pulsar observing span. The data is summarized in Table [l] To create this data we 



Table 1. Simulated IPTA pulsar datasets. The RMS values are measured from 
the data with no injected signal. The pulsar distances are taken from 46 if 
available. Otherwise the pulsar distances were taken from the ATNF catalog. 



Pulsar Name 


RMS [ns] 


Time Span [yr] 


Pulsar Distance [kpc] 


J0437-4715 


69 


14.8 


0.156 ±0.001 


J 1909-3744 


100 


9.0 


1.26 ±0.03 


J1713-I-0747 


136 


18.3 


1.05 ±0.06 


J1939-I-2134 


141 


16.3 


5.0 ± 2.0 


J1744-1134 


366 


16.9 


0.42 ±0.02 


J1857-I-0943 


402 


14.9 


0.9 ±0.2 


J1640-I-2224 


410 


14.9 


1.19 ±0.24 


J2317-I-1439 


412 


14.9 


1.89 ±0.38 


J 1824-2452 


602 


5.7 


3.6 ±0.72 


J0030-("0451 


792 


12.7 


0.28 ±0.1 



use the mean RMS from the IPTA pulsars and draw each residual from a gaussian 
distribution centered on the RMS with a standard deviation of 50% of the RMS. This 
way we are taking into account varying error bars and assuring that we only have 
gaussian white noise. We then simulate a continuous GW signal as in section [2] and 
add it to our simulated noise. Finally, in an attempt to take into account the most 
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important part of the timing model, we fit out a 2nd order polynomial from the data. 
The pulsar distances and uncertainties used in this analysis are the best measured 
values taken from [JHI if available, otherwise, we use the values from the Australia 
National Telescope Facility (ATNF) pulsar catalog and assume a 20% uncertainty. 
The rough cadence is chosen to simulate bi-monthly sampling. In order to present an 
idealistic yet plausible representation of current IPTA data sets, we have chosen to 
not include any intrinsic red noise which would only act to decrease sensitivity at low 
frequencies, therefore; the results presented here are likely to be optimistic. 

5. MCMC simulations 

In this section we wish to test the efficacy of our algorithm by injecting continuous 
GW signals into our simulated datasets described above. Although our main goal 
is to test our algorithm, we also wish to add a certain level of realism to these 
simulations. For this reason we have used mock IPTA datasets and will focus any 
astrophysical statements mostly to low SNR sources (SNR ~ 7) as this represents a 
realistic possibility in the next decade. We also include injections at higher SNR and 
mimic these injections in ideal datasets (10 pulsars timed for 10 years all with 100 
ns RMS drawn from an isotropic distribution on the sky) which have been used in 
previous parameter estimation work for PTAs [501 Ul HI] ■ 

Recent work has shown that there may be potential single GW source "hot spots" 
in the Virgo, Fornax and Coma clusters |47j . Since our purpose here is only to illustrate 
the efhcacy of our algorithm, we have randomly chosen to inject GW sources at the 
sky location corresponding to the Fornax cluster with a chirp mass of = 7 x IO^Mq 
and initial orbital period of 3.16 yr. The distance to the GW source is then scaled 
such that we achieve the desired SNR defined by 

SNR2 = ^(s(Ai„j)|s(Ainj))^, (41) 

a 

where the sum is over the number of pulsars and Ainj are the injected source 
parameters. This choice of injected parameters is justified since the amplitude of 
our GW induced residuals scales as A^^/'^w"^/^ and the stochastic GWB and other 
potential red noise sources will lower our sensitivity at lower frequencies. Therefore, 
we are likely to detect a source with high chirp mass and high frequency. See table 
|2] for a list of the different GW sources and parameters used in this work. For each 



Table 2. Simulated GW source parameters. These sources are injected at the 
sky location of the Fornax cluster and the distance is scaled such that we achieve 
the desired SNR. 



SNR 


[rad] 


ip [rad] 


^ [rad] 


L [rad] 


$0 [rad] 


M [Me] 


Dl [Mpc] 


/gw [Hz] 


7 


2.17 


0.95 


1.26 


1.57 


0.99 


7.0 X 10^ 


223.4 


2 X 10"® 


14 


2.17 


0.95 


1.26 


1.57 


0.99 


7.0 X 10^ 


111.7 


2 X 10"® 


20 


2.17 


0.95 


1.26 


1.57 


0.99 


7.0 X 10^ 


78.2 


2 X 10-8 



source, the same noise realization was used so that relative parameter accuracies do not 

f http : //www . atnf . csiro . au/people/pulsar/psrcat/ 
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depend on this specific noise realization. In general we would like to do a much more 
detailed analysis with many different noise realizations and many different injected 
sources. Indeed, this will be the subject of future work, however; here we simply 
want to test the various steps of our algorithm, that is, the search phase where we 
find the global maxima in the multi-dimensional parameter space, the sampling phase 
where we obtain samples from the underlying posterior distribution, and finally the 
evaluation phase where we compute the evidence and Bayes factors to make choices 
about detection. 



5.1. Searching for global maxima 

Since we have little information about the SMBMB population, we want to carry out 
a blind search of the parameter space making no assumptions about the underlying 
SMBMB source parameters. Therefore, it is very important that our algorithm be 
able to quickly find the global maxima of the log-likelihood function and the true 
parameters so that the sampling process can begin. The trace plots of one SNR — 20 
injection is shown in Figure [l] where we have plotted the measurable parameters 
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Figure 1. Trace plots for the measurable parameters (the inclination angle, 
initial phase and polarization angle are not well constrained for this realization) 
for an SNR=20 injection for the first 10^ steps. In all cases the black(green) line 
represents the injected parameters and the gray(blue) is the chain trace. We can 
see that the parallel tempering scheme has allowed us to locate the global maxima 
of the log-likelihood and all parameters within the first ^ 6 X 10** steps, (colour 
figures online.) 

(excluding the pulsar distance) as well as the log-likelihood as a function of chain 
iteration for the T = 1 chain. Here we do not plot the polarization angle, initial 
phase, or inclination angle as they are not well constrained by the data and contribute 
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little to the overall log-likelihood for this case. We can see from the figure that the 
algorithm has correctly found the true source parameters within the first ~ 6 x 10^ 
MCMC iterations. We note that the true value of the frequency is found quickly 
(within the first 10* steps of the algorithm) and we reach the true value of the log- 
likelihood within the first 4 x 10* steps. There are several ways that we could improve 
this step such as choosing a more suitable starting jump proposal distribution before 
starting adaptation or even starting adaptation sooner, however for the purpose of 
this work we believe that this is sufficient as the algorithm can still collect ^ 2 x 10^ 
samples with 8 chains in about 4 hours running on a 2.7 GHz quad core MacBook Pro. 
It is also important to note that in practice we will have carried out a simpler search 
algorithm such as an J^-statistic [171 UHl US] search prior to this Bayesian analysis. If 
any signal is detected, then we will have a very good idea of the frequency of the GW 
source and can therefore seed our MCMC algorithm much closer to the true value. 
Since the frequency contributes heavily to the log-likelihood, it is likely that this could 
reduce the number of samples required for this search phase by at least an order of 
magnitude. 

5.2. Sampling and parameter estimation 

For each injected source we run 4 serial chains all with 8 temperatures and starting 
positions chosen at random from the prior, thereby assuring that our algorithm can 
indeed locate the global maxima. Each serial chain was run for ~ 1.5 x 10^ iterations 
and 25% of each chain was discarded as burn in. The resulting post burn-in chains 
were then concatenated to form a single chain with ~ 4.5 x 10^ posterior samples. 

Figure [2] shows marginalized 2-D posterior pdfs of the sky coordinates {0,ip) and 
the log of the chirp mass and distance (log M, log Dl) for injected SNRs of 7, 14, and 
20 (shown from top to bottom) for a source injected at the sky position of the Fornax 
cluster. The "x" marker indicates the injected parameters and the solid, dashed and 
dot-dashed lines represent the 1, 2, and 3 sigma credible regions, respectively. The 
first thing to note from this figure is that the injected value lies well within the 1- 
sigma credible regions in all three cases. We also note that since we have injected a 
relatively high mass and high frequency GW, we can measure / and therefore; we can 
can break the degeneracy between chirp mass and distance as is seen in the plots on 
the right in the above figure. Since we know the true injected values, it is possible 
to determine how much each pulsar contributes to the log-likelihood function. For 
the aforementioned injection, four pulsars contribute more than 1% to the likelihood 
function for the SNR 7 injection and only three pulsars contribute more than 1% to 
the likelihood function for the SNR 14 and 20 injections. While this number does 
depend on the relative sky locations of the pulsars and the GW source as well as the 
specific noise realization, it is also a very strong function of the RMS of the noise in 
each pulsar ((In A) cx cr^^g). In fact, we can see the results of this in Figure [2] where 
there is a bit of multi modality in the posterior for sky position because we essentially 
only have three and four baselines (detectors) for the SNR 7 and 14 and 20 cases, 
respectively. 

This type of parameter degeneracy due to the small number of baselines differs 
from previous parameter estimation studies [20j [7l [21] where the simulated PTA 
consisted of a large number (20 or more) of pulsars all timed to the same accuracy. For 
this reason, quoted SMBHB parameter accuracies that can be obtained from PTAs 
should be interpreted cautiously as it is extremely unlikely that future era PTAs will 
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Figure 2. Marginalized 2-D posterior pdfs in the sky coordinates {6, <f>) and 
the log of the chirp mass and distance (log yV(,log -Dl) for injected SNRs of 7, 
14, and 20 shown from top to bottom. Here the injected GW source is in the 
direction of the Fornax cluster with chirp mass Ai = 7 X 10* Mq. The distance 
to the source is varied to achieve the desired SNR. Here the "x" marker indicates 
the injected parameters and the solid, dashed and dot-dashed lines represent the 
1, 2, and 3 sigma credible regions, respectively, (colour figures online.) 



even approach this ideal situation. To illustrate this point we have also simulated an 
ideal data set of 10 pulsars drawn uniformly on the sky with 100 ns RMS in each 
with baselines of 10 years. We also chose distances drawn uniformly from the range 
L G [0.5,1,5] kpc with 10% uncertainties. We have then used the same injection 
as in the simulated IPTA data at SNRs of 7, 14 and 20. The sky resolution [48| 
and fractional uncertainties on the chirp mass and distance for the simulated IPTA 
dataset are AQ = (2357.9,122.2,67.2) deg^, AM/M = (48.8%, 9.5%, 6.3%) and 
ADl/Dl = (81.2%, 28.2%, 19.9%), respectively. Whereas, for our ideal simulated 
datasets the corresponding values are Afi = (1085.9,23.7,12.8) deg^, AM/M = 
(47.9%, 4.4%, 3.0%) and ADl/Dl = (79.7%, 15.9%, 13.2%), respectively. Again, these 
results are not robust, in that we have only done one injection (with varying SNR) 
into one noise realization. Nonetheless, it should be clear that our simulated IPTA 
data do not yield nearly as precise sky resolution or chirp mass and distance fractional 
uncertainties as an ideal data set. 

5.3. Evaluating the evidence 

After we have carried out our parallel tempering MCMC search we can make use of 
the different temperature chains to calculate the evidence integral via Eq. |30[ Since 
we have measured the noise parameters before conducting our search, we use the 
log-likelihood ratio defined in Eq. [38] as our log-likelihood. By doing this we can 
compute the Bayes factor comparing our GW and noise models simply by calculating 
the evidence using the log-likelihood ratio. Figure [3] shows the log of the Bayes factor 
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Figure 3. Log of the Bayes factor plotted against injected SNR for the same 
signal and noise reahzation. The gray(green) horizontal hne is the threshold in the 
log of the Bayes factor in which we can claim a detection and the black(blue) points 
are the log Bayes factor calculated from thermodynamic integration, (colour 
figures online.) 



computed from thermodynamic integration for injections at different SNRs. Here we 
have done injections into the same noise reahzation of out simulated IPTA data using 
the same GW source (again with the distance scaled to give the desired SNR) as above. 
The MCMC sampler was run with 10 temperature chains for ^ 2 x 10^ iterations. In 
the figure, the gray(green) vertical line represents our threshold in the log of the Bayes 
factor of In 100, above which there is decisive evidence for a GW source 01] and the 
black(blue) points are the computed log Bayes factor for each injection. There are 
two important things to note. First, notice that the log of the Bayes factor is above 
the threshold for injected sources with SNR > 5 which agrees well with a frequentist 
interpretation of the SNR as a detection statistic in gaussian noise, where 5-sigma 
is usually required for a definitive detection. Secondly, as was discussed in (311, the 
Bayes factor is about unity for the zero to low SNR injections. This is because of the 
nature of the question that we are asking. In this case we are asking "Is there evidence 
for any continuous GW source in the data?". Framed in this way, the result makes 
perfect sense because a low SNR signal is nearly indistinguishable from pure noise, 
therefore the odds of a low SNR GW are about 50/50 indicated by a Bayes factor of 
1. If we were to ask the question "Is there a continuous GW source with SNR> 5 in 
the data?" , then we would expect the Bayes factor to become much less than unity at 
low SNR. 



A Bayesian analysis pipeline for continuous GW sources in the PTA band 15 
6. Conclusions and future work 

We have developed a robust MCMC algorithm that niakes use of an Adaptive 
Metropolis scheme and parallel tempering for use in PTA detection and parameter 
estimation of single sources of GWs from SMBHBs. We have tested the algorithm on 
a fairly realistic simulated IPTA dataset that has many of the features of real data 
including uneven sampling, varying error bars and overall noise levels, poor pulsar 
distance measurement uncertainty and varying data span. For comparison we have 
also run the algorithm on ideal datasets, similar to those that have been considered 
in the literature. The algorithm has shown to perform well in the three stages of our 
Bayesian analysis pipeline, namely the search, sampling and evaluation phase. When 
seeded from a random point in parameter space, the algorithm can quickly locate 
the global maxima through the use of parallel tempering. Posterior samples are then 
collected efficiently through the use of Adaptive Metropolis and special jump proposals 
in an extended parameter space. Finally, we have shown that this algorithm can 
also be used for detection through the use of parallel tempering and thermodynamic 
integration to calculate the Bayesian evidence. 

From the few simulations and comparisons of realistic vs. ideal data done in 
this work we can say that parameter estimation from current generation PTAs, 
counter to previous work on the subject, is likely to suffer due to the fact that few 
pulsars contribute to the total network SNR, resulting in a lower number of effective 
"detectors" than the number of pulsars in the array. A much more detailed study 
of the parameter estimation problem in current generation PTAs with more realistic 
noise models (including effects such as time varying Dispersion Measure) is underway 
and will be the subject of a future paper. 

In the future we hope to further modify this algorithm to incorporate intrinsic 
noise and stochastic background parameters in the search. In principle this is 
possible, however; this will result in a much larger parameter space and much more 
computationally demanding problem. To this end we plan on improving the algorithm 
to make it more efficient by implementing better initial jump proposals before the 
adaptation begins and also implementing an inter-chain adaptation scheme [SD] which 
allows for a much more efficient parallelization than running many independent chains 
in serial. Finally, we also plan to allow for multiple single GW sources in our data. 
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